lstm cell
Dense Video Captioning using Graph-based Sentence Summarization
Zhang, Zhiwang, Xu, Dong, Ouyang, Wanli, Zhou, Luping
--Recently, dense video captioning has made attractive progress in detecting and captioning all events in a long untrimmed video. Despite promising results were achieved, most existing methods do not sufficiently explore the scene evolution within an event temporal proposal for captioning, and therefore perform less satisfactorily when the scenes and objects change over a relatively long proposal. T o address this problem, we propose a graph-based partition-and-summarization (GPaS) framework for dense video captioning within two stages. For the "partition" stage, a whole event proposal is split into short video segments for captioning at a finer level. For the "summarization" stage, the generated sentences carrying rich description information for each segment are summarized into one sentence to describe the whole event. We particularly focus on the "summarization" stage, and propose a framework that effectively exploits the relationship between semantic words for summarization. We achieve this goal by treating semantic words as nodes in a graph and learning their interactions by coupling Graph Convolutional Network (GCN) and Long Short T erm Memory (LSTM), with the aid of visual cues. Two schemes of GCN-LSTM Interaction (GLI) modules are proposed for seamless integration of GCN and LSTM. The effectiveness of our approach is demonstrated via an extensive comparison with the state-of-the-arts methods on the two benchmarks ActivityNet Captions dataset and Y ouCook II dataset. ENSE video captioning, which aims at detecting all events and giving language descriptions in an untrimmed long video, is a very challenging problem in computer vision and has attracted a lot of research attentions recently. This task consists of two sub-tasks: 1) temporal proposal generation to localize the events and 2) video captioning to describe the events.
Applied Machine Learning Methods with Long-Short Term Memory Based Recurrent Neural Networks for Multivariate Temperature Prediction
This paper gives an overview on how to develop a dense and deep neural network for making a time series prediction. First, the history and cornerstones in Artificial Intelligence and Machine Learning will be presented. After a short introduction to the theory of Artificial Intelligence and Machine Learning, the paper will go deeper into the techniques for conducting a time series prediction with different models of neural networks. For this project, Python's development environment Jupyter, extended with the TensorFlow package and deep-learning application Keras is used. The system setup and project framework are explained in more detail before discussing the time series prediction. The main part shows an applied example of time series prediction with weather data. For this work, a deep recurrent neural network with Long Short-Term Memory cells is used to conduct the time series prediction. The results and evaluation of the work show that a weather prediction with deep neural networks can be successful for a short time period. However, there are some drawbacks and limitations with time series prediction, which will be discussed towards the end of the paper.
Differential Machine Learning for Time Series Prediction
Yadav, Akash, Nualart, Eulalia
Accurate time series prediction is challenging due to the inherent nonlinearity and sensitivity to initial conditions. We propose a novel approach that enhances neural network predictions through differential learning, which involves training models on both the original time series and its differential series. Specifically, we develop a differential long short-term memory (Diff-LSTM) network that uses a shared LSTM cell to simultaneously process both data streams, effectively capturing intrinsic patterns and temporal dynamics. Evaluated on the Mackey-Glass, Lorenz, and R\"ossler chaotic time series, as well as a real-world financial dataset from ACI Worldwide Inc., our results demonstrate that the Diff- LSTM network outperforms prevalent models such as recurrent neural networks, convolutional neural networks, and bidirectional and encoder-decoder LSTM networks in both short-term and long-term predictions. This framework offers a promising solution for enhancing time series prediction, even when comprehensive knowledge of the underlying dynamics of the time series is not fully available.
Reviews: Input-Cell Attention Reduces Vanishing Saliency of Recurrent Neural Networks
The paper starts by showing empirically and theoretically that saliency maps generated using gradient vanishes over long sequences in LSTMs. The authors propose a modification to the LSTM cell, called LSTM with cell-attention, which can attend to all previous time steps. They show that this approach improve considerably the saliency on the input sequence. They also test their approach on the fMRI dataset of the Human Connectome Project (HCP). Originality: The proposed LSTM with cell-attention is a novel combination of well-known techniques.
Seq2Seq Model-Based Chatbot with LSTM and Attention Mechanism for Enhanced User Interaction
Benaddi, Lamya, Ouaddi, Charaf, Souha, Adnane, Jakimi, Abdeslam, Rahouti, Mohamed, Aledhari, Mohammed, Oliveira, Diogo, Ouchao, Brahim
A chatbot is an intelligent software application that automates conversations and engages users in natural language through messaging platforms. Leveraging artificial intelligence (AI), chatbots serve various functions, including customer service, information gathering, and casual conversation. Existing virtual assistant chatbots, such as ChatGPT and Gemini, demonstrate the potential of AI in Natural Language Processing (NLP). However, many current solutions rely on predefined APIs, which can result in vendor lock-in and high costs. To address these challenges, this work proposes a chatbot developed using a Sequence-to-Sequence (Seq2Seq) model with an encoder-decoder architecture that incorporates attention mechanisms and Long Short-Term Memory (LSTM) cells. By avoiding predefined APIs, this approach ensures flexibility and cost-effectiveness. The chatbot is trained, validated, and tested on a dataset specifically curated for the tourism sector in Draa-Tafilalet, Morocco. Key evaluation findings indicate that the proposed Seq2Seq model-based chatbot achieved high accuracies: approximately 99.58% in training, 98.03% in validation, and 94.12% in testing. These results demonstrate the chatbot's effectiveness in providing relevant and coherent responses within the tourism domain, highlighting the potential of specialized AI applications to enhance user experience and satisfaction in niche markets.
Probing for Consciousness in Machines
Immertreu, Mathis, Schilling, Achim, Maier, Andreas, Krauss, Patrick
This study explores the potential for artificial agents to develop core consciousness, as proposed by Antonio Damasio's theory of consciousness. According to Damasio, the emergence of core consciousness relies on the integration of a self model, informed by representations of emotions and feelings, and a world model. We hypothesize that an artificial agent, trained via reinforcement learning (RL) in a virtual environment, can develop preliminary forms of these models as a byproduct of its primary task. The agent's main objective is to learn to play a video game and explore the environment. To evaluate the emergence of world and self models, we employ probes-feedforward classifiers that use the activations of the trained agent's neural networks to predict the spatial positions of the agent itself. Our results demonstrate that the agent can form rudimentary world and self models, suggesting a pathway toward developing machine consciousness. This research provides foundational insights into the capabilities of artificial agents in mirroring aspects of human consciousness, with implications for future advancements in artificial intelligence.
Enhancing Energy-efficiency by Solving the Throughput Bottleneck of LSTM Cells for Embedded FPGAs
Qian, Chao, Ling, Tianheng, Schiele, Gregor
To process sensor data in the Internet of Things(IoTs), embedded deep learning for 1-dimensional data is an important technique. In the past, CNNs were frequently used because they are simple to optimise for special embedded hardware such as FPGAs. This work proposes a novel LSTM cell optimisation aimed at energy-efficient inference on end devices. Using the traffic speed prediction as a case study, a vanilla LSTM model with the optimised LSTM cell achieves 17534 inferences per second while consuming only 3.8 $\mu$J per inference on the FPGA XC7S15 from Spartan-7 family. It achieves at least 5.4$\times$ faster throughput and 1.37$\times$ more energy efficient than existing approaches.
RigLSTM: Recurrent Independent Grid LSTM for Generalizable Sequence Learning
Wang, Ziyu, Jiang, Wenhao, Zhang, Zixuan, Tang, Wei, Yan, Junchi
Abstract--Sequential processes in real-world often carry a combination of simple subsystems that interact with each other in certain forms. Learning such a modular structure can often improve the robustness against environmental changes. In this paper, we propose recurrent independent Grid LSTM (RigLSTM), composed of a group of independent LSTM cells that cooperate with each other, for exploiting the underlying modular structure of the target task. Our model adopts cell selection, input feature selection, hidden state selection, and soft state updating to achieve a better generalization ability on the basis of the recent Grid LSTM for the tasks where some factors differ between training and evaluation. Specifically, at each time step, only a fraction of cells are activated, and the activated cells select relevant inputs and cells to communicate with. At the end of one time step, the hidden states of the activated cells are updated by considering the relevance between the inputs and the hidden states from the last and current time steps. Extensive experiments on diversified sequential modeling tasks are conducted to show the superior generalization ability when there exist changes in the testing environment. A certain patterns and characterizing real-world dynamic processes, such as component is corresponding to a certain part of the environment. Therefore, models adopt such reinforcement learning for intelligent agents [11], [12].